MACRO.TXT[VLI,LSP] - www.SailDart.org

perm filename MACRO.TXT[VLI,LSP] blob sn#382120 filedate 1978-09-14 generic text, type T, neo UTF8
.page
The Macro Instruction Architecture
	
	Lisp, while not unique in this respect, has a long history of using multiple
data representations for programs.  Viewed initially as a strictly interpretive system,
the desire for faster performance rapidly inspired development of increasingly
sophisticated compilers.  Although the ability to run interpreted code remains an
important language feature, today almost all serious work is done with compiled lisp
programs. 
	The lisp machine system offers two improvements over existing lisp
implementations in the area of program representation.  First, we had the opportunity to
define for ourselves a computer architecture well suited to the lisp environment.  The
goals of this architecture are high bit efficiency in the representation of compiled
instructions, and ease of the compiler construction.  Implementation of these
requirements led to an instruction set that, although conceptually simple, is rather
complex in implementation.  This led naturally to the choice of a microcoded
architecture. 
	Second, the existence of the microprogrammable processor provided the
opportunity to execute user functions in another way - by compiling the functions into
the microcode of the machine. 
	We thus provide the lisp user with three levels of program execution, each
optimized for a different task.  Interpreted execution provides the ability for programs
to construct, modify, and analyze other programs, an ability almost unique to lisp and
likely to become more important with the development of more sophisticated program
analyzing software.  The traditional role of interpreted execution as a simple debugging
technique is largely supplanted in the lisp machine environment by the fast compiler and
excellent compiled code run time error checking. 
	A second execution mode found in the lisp machine system is traditional 
compiled code.  Here, the user function is transformed into a bit-efficient and more
quickly interpretable form we call →macro-code_.  Emphasis in this process is on the bit
efficiency of the resulting instructions, since it is anticipated that the vast majority
of system software will be executed in this form.  The high bit effciency of this format
reduces program size, resulting in smaller working sets, a smaller requirement for main
memory, and less swapping bandwidth from secondary storage. 
	The third execution mode results from executing user functions as microcoded 
processor subroutines.  In this mode, emphasis is on fast execution, rather than bit
efficiency.  The micro-compiled functions occupy a limited and expensive resource, the
processor micro-code storage.  Providing this quick access to the basic hardware of the
machine is a goal which argued strongly for a single user machine -- it is very hard to
protect users from one another in an environment allowing alterable microcode, not to
mention the difficulties of allocating the resource effectively. 
	Next, we will examine in some detail the format of the macro instruction set,
and the format of the data upon which it operates.  The goal is not a complete
understanding of the software basis of the lisp machine system, which is far beyond the
scope of this report, but rather an appreciation for the type and complexity of the
operations which the microprocessor must support. 

.ce
The Macro Architecture

→Typed Data_
	The macro instruction set implements high level functions operating primarily
on typed data of a format shown below:




Each 32 bit word, which we term a "Q", is capable of holding one 24 bit pointer
to any location in the virtual address space of the machine.  A full lisp node
containing two pointers thus normally requires two sequentially located Q's.
The first holds a pointer to the CAR of the lisp node, and the second, a pointer to
the CDR.





→CDR Codes_
	By means of the two bit CDR CODE field in the Q, however, this storage requirement
can normally be halved.  The CDR CODE allows a Q to specify that the next sequential location
is not a pointer to CDR of the node, but rather that it →is_ the CDR of the node.
A third state of the CDR CODE field specifies that the CDR of this node is 
NIL, so that the end of a list may be represented without an additional
word.  Thus, the two bit CDR CODE field is decoded into three useful states:

	CDR-NORMAL	(the next word is the second half of a full node)
	CDR-NEXT	(the next word is the CDR of this node)
	CDR-NIL		(the CDR of this node is nil)

	The combination of these codes allows representation of single
level lists, such as (A B C D) as four sequential Q's, rather than as
four two Q full nodes.




→The Datatype Field_
	The five bit datatype field found in each Q is used to determine the
representation and semantics of the data being referenced.  Normal lisp list structure
contains datatype LIST, indicating it has a lisp node representation such as that given
above.  Many other datatypes exist in the system, however.  Integers of less than 24
bits, for example, are represented with datatype FIXNUM and the binary number stored in
the 24 bit pointer field.  Arrays, strings, and other number representations, such as
floating point and so-called BIGNUM integers (larger than 24 bits), have more complex
representations, which we will not discuss here, except to mention their existance and
that they are represented with distinct datatypes. 
	The datatype field is also useful in implementation of system storage
conventions, as exemplified in the various types of so-called →forwarding_ or →invisible_
pointers.  They act, as a class, somewhat as does an indirect address reference in a
traditonal instruction set architecture, in that they do not contain the data being
referenced, but rather a pointer to that data.  The important difference is
that the forwarding pointer datatype is specified in the data,  rather than in the
instruction.  Several forms of the forwarding pointer exist in the machine, differing
in the conditions under which the indirection is performed.  A good example of
the use of this class of datatypes is the GC-FORWARDING datatype.  When the garbage
collector relocates an object, it replaces the original object data with a GC-FORWARDING
pointer to the new location of the object.  References which still point to the old
location of the relocated data will encounter this forwarding datatype and follow the
indirect, eventually locating the correct data.  In the case of GC-FORWARDING  pointers,
the forwarding pointer is spliced out of the reference path by changing the original
pointer to reflect the  relocated position of the data. 

→The Macro Instruction Set_
	We now turn to a description of the sixteen bit instructions which operate on
this typed data.  Each instruction is divided into four fields, which are used
differently in the four main instruction classes.  The  instruction class is specified
in the high order four bits, which are decoded sixteen ways.  Nine of these are
address/destination instructions, and specify the operation directly in this field.  Three
of the sixteen combinations are decoded further with the destination field, and form the
address only instruction class.  One decodes the address field, providing 512 operations
in the destination only class, and finally, one is decoded to provide the branch class. 
The branch class decodes the destination field as a condition specifier, and uses the
address field as a relative program counter increment. 

<diagram of sixteen bit field layout>

→Calculation of the Effective Address_
	The effective address for source/destination and source-only instructions is
specified by the nine bit source address field.  Three of these bits are used to select
one of eight base registers; the remaining six bits are used as a positive displacement
from the register.  The registers are automatically updated by the processor when changes
to the execution environment occur, such as during a function call.  Four of the
registers point to portions of the currently executing function entry frame, with
their contents offset by 64 locations, such that a total of 256 locations may be referenced.
One points to the origin of the argument list for the currently executing function,
another to the local binding block for this function, and  a third to a
set of quoted constants, such as T and NIL.  Finally, one state of the base register field
is used to indicate special references.  The six bit destination field is then used
to specify the action.  At the present time, only one state of this field is used,
indicating that the addressed value is on the top of the stack, and that the stack pointer
should be decremented after this reference.
	The 256 locations addressable in the function entry frame often contain
forwarding pointers to items of interest to the function.  This is the way in which
pointers to other called functions, for example, are stored.
	A small class of instructions, such as MOVEM and POP (see below) use the
source address to specify a location for writing;  others modify the contents.  The
address is calculated identically in any case.

→Source/Destination Instructions_
	The nine source/destination instructions consist of one data moving operation
[MOVE], six pointer following operations [CAR,CDR,CAAR,CDAR,CADR,CDDR] and two
function calling instructions [CALL,CALL0].  Each instruction calculates its
effective address from the source field, performs its operation on the data fetched,
and disposes of its result as specified in the three bit destination field (see below).
The CALL and CALL0 instructions behave differently, as we will see below in the discussion
of the inverted calling sequence.

→The Destination Field_
	The three bit destination field specifies the dispostion of results from
an instruction.  The IGNORE code permits the macro instruction set to discard unneeded
results, while still setting the condition code bits used for conditional branches.
A destination of STACK permits pushing the result data onto the push down list.
NEXT-LIST allows, in combination with the setup instruction LIST, a convenient mechanism
for compiling the construction of lists. The LIST-N instruction allocates N sequential
Q's from free storage and pushes its destination code and two pointers to the beginning
of the allocated area.  A destination field of NEXT-LIST stores the data in CAR of
what is pointed to by the top of the stack.  It then replaces the top of the stack with
its CDR, pointing now to the next location to be filled.  If the result of this CDR is
NIL, the list is full, and the other pointer to the allocated area is removed from the
stack, and stored as specified by the destination code of the original LIST-N instruction.
	The remaining destination field specifiers combine with the CALL and CALL0
operations to form the function calling and returning mechanism of the macro order code.
	A function call with arguments is initiated by the CALL instruction.  It
specifies a pointer to the function cell of the called function in the address field.
In the destination field, it specifies what is done with the →value returned by this
function call_.  Execution of the call instruction does →not_ result in transfer
of control to the function -- it merely sets up internal pointers and in effect
prepares the machine for an eventual function call.  The CALL instruction is
typically followed by a series of macro instructions with a special destination code
we call NEXT.  It specifies that the result of this instruction is the next argument
to the function which is about to be called, and causes it to be added to the
argument list of that function calling frame.  When a macro instruction with destination
code LAST is finally executed, it stores the final argument to the function and initiates
the actual transfer of control, base register modifications, and binding changes 
necessary to implement the call.
	When the called function eventually encounters a macro instruction with the
destination code RETURN it restores registers, unbinds variables, and fetches from
the stack the stored destination code of the CALL instruction which originally
set up its function call.  It then uses that destination code to dispose of the
returned function value as would any other macro instruction.
	This entire process of function calling is termed the →inverted calling
sequence_ because the function to be called is specified prior to the evaluation
of any of its arguments.
	Functions of no argument are called with the CALL0 macro instruction, since
there is no opportunity to terminate the actual function call with a destination
code of LAST, and the function must be called immediately.
	The utility of the inverted calling sequence arises from the (possible) need
for the macro instruction set to behave differently in storing function arguments
depending upon the level of execution of the function about to be called (macro, micro,
interpreted, etc.).  By providing the information concerning the function prior
to the evaluation of its arguments, special action, if needed, can take place during
storage of the arguments.
Destination Only Instructions

	The destination only instruction class has no effective address field.
Instead, it allocates the nine bit field to further specify the one of 512 instructions.
These instructions take arguments only on the stack, and provide the normal three
bit destination code as described above.  Included in this class of operations are
LIST of 0 to 63 elements [see the discussion of destination NEXT-LIST above], all
pointer manipulation instructions of the form CxxxR and CxxxxR (the x's representing
either A or D for CAR or CDR), and a variety of lisp primitive functions such as
CONS, NCONS, GET, ASSOC, ASSQ and EQUAL.  These functions could of course be called
with the normal function calling sequence, but this method is more efficient.

→Address Only Instructions_
	Two of the sixteen possible main instruction codes are further decoded
from the destination field.  The sixteen operations provided in this way are:
.nofill

MOVEM	Stores the top of the stack into the effective address

SCDR	Replaces the contents of the effective address with its CDR

SCDDR	Replaces the contents of the effective address with its CDDR

1+	Replace the contents of the effective address with itself plus one

1-	Replace the contents of the effective address with itself minus one

>	Pop one argument from the top of the stack, compare with the
<	contents of the effective address, and set the appropriate 
EQ	condition code indicators

ADD	These arithmetic operations take one operand from the stack and the
SUB	other from the effective address.  The result replaces the operand on
MUL	the stack.
DIV
REM	
.fill

→Branches_
	All data handling operations set two condition code bits as a result
of their execution.  For list manipulating instructions these indicators are the
ATOM indicator and the NIL indicator, and they are set or cleared according to
whether the result was atomic and null, respectively.  
	The branch instruction decodes the three bit destination field as
a branch condition.  The conditions include an unconditional branch, either
state of the two condition code bits, and two branch conditions which test the
NIL indicator and, if it is set, pop the stack in addition to performing the
normal conditional branch.  This operation is useful in compiling the end
test of functions which examine sequential elements of a list.

.PAGE
→Sample Compiled Macrocode Sequences_

	We now present some simple examples of lisp functions with the
corresponding macro compiled instructions.  For comparison of relative bit
efficiencies, we also include the PDP-10 compiled version, produced by the
MacLisp number compiler.
.nofill

(defun add (x y)(+ x y))	;the trivial call to a built in function

Macro code

MOVE	D-PDL    ARG|0		;FIRST ARGUMENT (X)
+		 ARG|1		;SECOND ARGUMENT (Y)
				;OTHER ARGUMENT ON THE STACK, RESULT TO STACK
MOVE	D-RETURN SPECIAL|77	;RETURN THE TOP ELEMENT OF THE STACK,
				;AND POP THE STACK POINTER

.page
(defun fact (n)(cond ((= n 1) 1)(t (* n (fact (1- n))))))
				;simple recursive functional call


MOVE   D-PDL	ARG|0		;FIRST ARGUMENT TO STACK
EQ		FEF|(QUOTE 1)	;COMPARE WITH A QUOTED CONSTANT IN THE FUNCTION ENTRY FRAME
BRANCH NILIND-TRUE   MORE	;BRANCH IF THE RESULT IS NIL
MOVE   D-RETURN	FEF|(QUOTE 1)	;RETURN QUOTED CONSTANT STORED IN FUNCTION ENTRY FRAME

MORE:
MOVE   D-PDL    ARG|0		;FIRST ARGUMENT TO STACK
CALL   D-PDL    FEF|(*FUNCELL FACT)	;SETUP TO CALL FUNCTION, POINTER TO FUNCTION
					;CELL LOCATION IS STORED IN FEF
MOVE   D-PDL    ARG|0		;ARGUMENT TO STACK
1-     D-LAST			;SUBTRACT ONE FROM TOP OF STACK, INITIATE RECURSIVE
				;FUNCTION CALL
*      		SPECIAL|77	;MULTIPLY TOP OF STACK (RETURNED RECURSIVE FUNCTION VALUE)
				;WITH SECOND ELEMENT OF STACK (ARGUMENT OF THIS FUNCTION)
				;RESULT TO STACK
MOVE   D-RETURN SPECIAL|77	;RETURN TOP OF STACK

total instructions: 10		total bits:  160


PDP-10 MacLisp Compiled output

	PUSH P,1		;ARGUMENT TO STACK
	MOVE 7,(1)		;NUMERIC VALUE OF ARGUMENT
	SOJN 7,G0002		;SUBTRACT ONE; IF NON-ZERO, GOTO G0002
	MOVEI 1,(QUOTE 1)	;PICKUP POINTER TO QUOTED VALUE
	JRST G0001		;TRANSFER TO EXIT

G0002:	MOVE 7,(1)		;PICK UP NUMERIC VALUE OF ARGUMENT
	SUBI 7,1		;SUBTRACT ONE
	PUSH FXP,7		;PUSH ONTO THE INTEGER STACK
	MOVEI 1,(FXP)		;PICK UP A POINTER TO THE TOP OF THE INTEGER STACK
	CALL 1,(QUOTE FACT)	;CALL RECURSIVELY USING THIS POINTER AS AN ARGUMENT
	MOVE 7,@(P)		;PICKUP NUMERIC VALUE OF ARGUMENT
	IMUL 7,(1)		;MULTIPLY BY NUMERIC VALUE OF RETURNED DATA
	JSP T,FXCONS		;BOX THE RESULT, RETURNING A POINTER IN AC1
	SUB FXP,[1,,1]		;CLEANUP AND POP THE TWO STACKS
G0001:	SUB P,[1,,1]
	POPJ P,			;EXIT

total instructions: 16		total bits:  576

bit efficiency ratio:	3.6
.page
;A simple iterative function definition
(defun sum (x)(prog (result)
		(setq result 0)
	a	(cond ((null x)(return result))(t (setq result (+ result (car x)))
						  (setq x (cdr x))
						  (go a)))))

MOVE	D-PDL	FEF|(QUOTE 0)		;MOVE A QUOTED CONSTANT ONTO THE STACK
POP		LOCBLOCK|0		;STORE IN A LOCAL VARIABLE (RESULT)

A:
MOVE	D-IGNORE ARG|0			;TEST FIRST ARGUMENT (X)
BRANCH	NILIND-FALSE MORE		;BRANCH IF NON-NIL
MOVE 	D-RETURN LOCBLOCK|0		;RETURN RESULT

MORE:	
MOVE	D-PDL	LOCBLOCK|0		;PUSH LOCAL VARIABLE (RESULT) ONTO STACK
CAR	D-PDL	ARG|0			;PUSH CAR OF THE FIRST ARGUMENT ONTO STACK
+		SPECIAL|77		;ADD THE TOP TWO STACK ELEMENTS, RESULT TO STACK
POP		LOCBLOCK|0		;STORE TOP OF STACK IN LOCAL VARIABLE (RESULT)
SETE-CDR	ARG|0			;REPLACE THE ARGUMENT WITH ITS CDR
BRANCH	ALWAYS	A			;CLOSE THE LOOP

.PAGE
.fill
	An examination of the macro compiled code shown above will probably convince
you that this particular macro instruction set is not the ultimate in bit efficiency
for compiled lisp representation.  Other bit efficient architectures have been described
using eight bit instructions [Deutsch].  The tradeoffs come down to decisions on the
complexity of the compiler construction and the ease of thinking about and implementing
micro coded interpreters for these architectures.  The beauty of a writable microcode
processor is that this sort of investigation and tuning of the macro coded architecture
is possible without change to the hardware.

→Summary_

	We have attempted to demonstrate the increase in the complexity of the
interpretation of the macro instruction set as a result of the optimization of that
architecture to the needs of a bit efficient representation for the compiled lisp
function.  The complexity is introduced both in the execution of such features
in the instruction set as the DESTINATION-LAST and DESTINATION-RETURN tags, and
in the interpretation of the typed data, with its run time datatype checking and
handling of complex storage conventions such as the CDR-CODE fields and the 
forwarding datatypes.  It should also be mentioned that a fair number of the
miscellaneous functions in the macro instruction set perform very complex operations
such as storage allocation.  The next chapter describes the architecture of the 
microprogrammable processor we have designed to implement this architecture.
βββ